R Markdown is a tool for making dynamic documents with R. It combines markdown, a lightweight markup language with an easy-to-write plain text format, with sections or chunks of embedded R code. This is powerful as it allows you to write reports or presentations that contain your R code.
Markdown is a flexible and lightweight tool for document writing. Two good sources for picking up the basic syntax are the markdown guide and the markdown tutorial. You can use these as points of reference throughout this tutorial.
There are two main reasons for writing reports or presentations in R Markdown instead of LaTeX: simplicity, and the ability to include R code. Markdown is a markup language (like LaTeX and HTML) that is very easy to read and write, which makes your life easier when writing documents. You can supplement your documents with LaTeX or HTML if needed. The inclusion of R code chunks makes R Markdown very flexible, and many people use it not just for reports but for their general analysis work as well; it brings together all the elements of an analysis, which is really valuable.
There are very few disadvantages to using R Markdown over just LaTeX. One disadvantage is you need to use R and RStudio, which not everyone will have installed or feel comfortable using.
We should also note that R Markdown is a flavour of Markdown. It can work with a wide range of programming languages including R, Python, C++, JavaScript, Bash, and SQL, all of which also work within RStudio!
For this workshop we will be using Quarto, which is a new (2022) piece of software built on R Markdown and designed to bring some quality of life improvements over R Markdown. Quarto works in the same way as R Markdown, so if you know one you will know both.
Artwork by @allison_horst
By the end of this workshop you will:
It might feel like there is a lot to explore. However, we split it into separate sections to make it easier for you to work through. We hope you enjoy it!
For this workshop you will need the following software and packages.
Make sure you have R and RStudio installed. If you haven’t, first install R and then RStudio.
As we are using Quarto as our markdown tool, there are a few steps to go through to get it up and running.
First we need to install Quarto. Select the installer which matches your device (Windows or Mac), then follow the installation instructions. If installing on the library computers, just install it for your own profile.
Next you will need to install tinytex. Depending on whether you are on a Mac or a Windows machine, the series of images below will guide you through this.
The first step on a Windows machine is to search for terminal or command prompt in the search bar at the bottom right of your screen. Then open command prompt.
Once the command prompt is open, type in or copy the following command: quarto install tinytex
An install programme should run. When it has completed you should see a message saying installation complete (or similar). This means you’ll be ready to go to the next step.
The first step on a Mac is to select the Go menu on the top bar -> then Utilities (or use shift+command+U).
When in the utilities screen, select the terminal to open it.
Once the terminal is open, type the following command to install tinytex: quarto install tinytex
An install programme should run. When it has completed you should see a message saying installation complete (or similar). This means you’ll be ready to go to the next step.
Once you have installed Quarto and tinytex, open RStudio and run the commands below to make sure the quarto package is installed and loaded.
install.packages("quarto")
library(quarto)
We are now up and ready to write some pdf reports!
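If you would like to confirm that R can find your Quarto installation before moving on, a check along the following lines may help. This is a minimal sketch rather than part of the workshop materials: quarto_ready() is a hypothetical helper name, and it assumes quarto::quarto_path() returns NULL when no Quarto installation is found.

```r
# Hypothetical helper: returns TRUE when the quarto R package is installed
# and it can locate the Quarto command line tool, FALSE otherwise.
quarto_ready <- function() {
  if (!requireNamespace("quarto", quietly = TRUE)) {
    return(FALSE)
  }
  # quarto_path() gives the path to the Quarto executable, or NULL if none is found
  !is.null(quarto::quarto_path())
}

if (quarto_ready()) {
  message("Quarto found at: ", quarto::quarto_path())
} else {
  message("Quarto not found. Check the installation steps above, then restart RStudio.")
}
```

If the second message appears, restarting RStudio after installing Quarto is usually worth trying first.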
When you open a Quarto document, which has the extension .qmd, you will see an interface like this.
The grey areas are the code chunks. To run code cells you click on the green play button or use control/command + enter.
The white space is where we write our text. We can use markdown formatting here.
The blue # are headers. One # means top header, two # means second header and so on. We use these to make sections in our document.
The top part (the one that starts with title and ends with output, between the --- lines) is written in a language called YAML. Here we tell Quarto what output we want, who we are and so on; this creates metadata that is used to compile your documents.
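As a rough illustration, a minimal Quarto YAML header for a pdf report could look like the sketch below. The title and author values are placeholders, and the header in the workshop template will contain more options than this. Note that Quarto uses format where R Markdown uses output.

```yaml
---
title: "My report title"   # placeholder
author: "Your Name"        # placeholder
date: today                # Quarto fills in the date when the document is compiled
format: pdf                # the type of output we want
---
```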
You recently converted a cool article about the gender pay gap in the UK to LaTeX, which was really fun!
The author of the article was feeling generous and also sent you the code and data used to make the figures and tables in the report. This made you wonder ‘can I do this report all in one with the code and text?’…then you remember that with R Markdown you can do just this! You decide it would be fun to convert the same article but this time use R Markdown to see the difference. Anyway, learning new skills is fun!
Feeling organised, you collate everything you need into a folder which contains:
- gender_pay_gap.pdf
- report_template.qmd
- references.bib
- pay_gap_bot.png
- analysis that contains the following:
  - paygap.png
  - paygap.R and
  - sector_averages.R
- data that contains:
  - gb-grid (this is used to make the paygap.png image)
  - paygap_sector_averages.csv
  - postcode_polygons.gpkg
Click link to access the files
Open the report_template.qmd file.
You’ll notice a few things. First is the section at the top that has three dashes and some text like output, author etc. This is the YAML (YAML Ain’t Markup Language) header.
YAML in markdown is like the preamble section in LaTeX where we define what our document will look like. There are a lot of options and changes we can make. For now use this template and play around with it later!
Before going onto the next task, press the knit button (or press shift+command/control+k) to compile your document so you can see what it looks like.
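You can also compile the document from the R console instead of using the button. The sketch below assumes the quarto R package from the setup steps is installed and that report_template.qmd is in your working directory; render_report() is a hypothetical wrapper, not something provided by the template.

```r
# Hypothetical wrapper around quarto::quarto_render() that fails early
# with a readable message if the file cannot be found.
render_report <- function(input = "report_template.qmd") {
  if (!file.exists(input)) {
    stop("Cannot find '", input, "'. Check your working directory with getwd().")
  }
  # Compiles the document using the format(s) set in its YAML header
  quarto::quarto_render(input)
}

# Usage (run this from the folder that contains the file):
# render_report("report_template.qmd")
```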
We have a document template, so the first thing we want to do is build up that title page!
Using the gender_pay_gap.pdf file as your example:
Hint: everything you change here is in the YAML header.
Note: the date will find today’s date for you, which is handy. If you wanted to change it manually you can also do that.
Our title page is looking good!
Next we set up the contents page which should include a table of contents, list of figures, and list of tables. We might not have added tables or figures yet, but we will soon.
Do not hesitate to search for ways to do it online!
Hint: To match the document exactly you’ll need to use LaTeX, adding the LaTeX code after the YAML header (the three dashes at the top of the page). If you get stuck check out the LaTeX documentation via Overleaf. You can also do this with Quarto, as explained in their documentation, though LaTeX is simpler in this case.
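One possible way to build the contents pages with LaTeX is sketched below. These are standard LaTeX commands placed in the body of the document, straight after the closing three dashes of the YAML header. The page breaks are an assumption about the layout, so adjust them until your output matches gender_pay_gap.pdf.

```latex
\tableofcontents
\newpage
\listoffigures
\listoftables
\newpage
```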
For the next tasks we will need to use the resources the author sent us.
I recommend the following:
- report_template.qmd file and all the rest of the resources files there
- report_template.qmd again and you’re good to go
When you can see the files in your RStudio file viewer, click on one of them like the pay_gap_bot.png file to open it. If it opens, you’re good to move on!
We are so organised! Now we can start adding the real content to the document.
- Add the pay_gap_bot.png image file in your project
Hint: You can use markdown syntax to add the image, but to add a caption I’d recommend using knitr for this. See this page for an example about how to do this (last example with R markdown logo). The out.width and fig.align are known as chunk options and determine what the output of a code chunk looks like.
Note: to make the image the same as the example, something like this in the chunk options will work: out.width = '70%'
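Putting the hints together, the body of the code chunk could look something like this. It is a sketch only: the chunk label, caption text, and fig.align value are made-up placeholders, and the chunk header is shown as a comment because it belongs on the line that opens the chunk.

```r
# Assumed chunk header (goes inside the curly brackets that open the chunk):
# {r pay-gap-bot, out.width='70%', fig.align='center', fig.cap='Placeholder caption'}

image_file <- "pay_gap_bot.png"

# include_graphics() stops with an error if the file is missing,
# so check first and give a friendlier hint.
if (file.exists(image_file)) {
  knitr::include_graphics(image_file)
} else {
  message("Cannot find ", image_file, ". Is it in the same folder as your .qmd file?")
}
```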
This section is strictly extra information which might be helpful in your future document writing endeavours. Feel free to skip over it and move on to task 5.
R Markdown works as a combination of plain text, markdown, and code chunks. A code chunk is anything enclosed between two sets of three backticks. This is true for all markdown types, and chunks are usually shown in a different colour such as grey. The cool thing about R Markdown is that it allows you to run these code chunks interactively. To insert a new code chunk in RStudio use either Ctrl+Alt+I for Windows or Cmd+Option+I for Mac.
The syntax for a code chunk is broken down as follows:
Below are two examples where we use some code to make a basic figure. One has chunk options, the other does not. Without chunk options we see everything. This is useful for showing collaborators, colleagues, and friends what you’ve done, but it is less good when you want a tidy report.
set.seed(19)
x <- runif(100)
y <- runif(100)
plot(
  x = x, y = y,
  col = ifelse(y > 0.6 | x < 0.3, '#2BB3F1', '#F26B2C'),
  pch = 19,
  main = "Example plot"
)
Figure: my interesting caption
The options used are listed below, which have to go within the curly brackets.
- {r my-plot}
- {fig.align="center" and fig.cap="my interesting caption"}
- {out.width="50%"}
- {echo=FALSE}
You end up with chunk options that look like:
{r my-plot, fig.align="center", fig.cap="my interesting caption", out.width="50%", echo=FALSE}
A side note on the side note: there are other ways to define chunk options. Rather than write them here, the author of R Markdown has written about these in his excellent documentation, and the Quarto documentation details how chunk options can differ between Quarto and R Markdown.
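For comparison, Quarto also lets you write chunk options as special comments at the top of the chunk, each starting with #|. The sketch below repeats the example plot with the same options written in that style. Treat the exact option spellings as something to check against the Quarto documentation.

```r
#| label: my-plot
#| fig-align: center
#| fig-cap: "my interesting caption"
#| out-width: "50%"
#| echo: false

# Same example plot as before; the lines above are read by Quarto as chunk options
set.seed(19)
x <- runif(100)
y <- runif(100)
plot(
  x = x, y = y,
  col = ifelse(y > 0.6 | x < 0.3, "#2BB3F1", "#F26B2C"),
  pch = 19,
  main = "Example plot"
)
```

Because the options are ordinary R comments, the code still runs unchanged if you paste it into the console.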
If you have not done so already knit your document again so we can see the changes we made! Remember we press the knit button (or press shift+command/control+k) to compile our document with R Markdown. You should see that the list of figures has appeared with an entry, yay!
The methods section has more new elements for us to try out such as equations and citations.
Equation hint 1: you can use LaTeX or markdown syntax to write equations.
Equation hint 2: There are various shortcuts
available to make your equation writing easier. If I wanted to write the
below equation I’d need to do $$ {2 \over 4} \times 5 $$.
\[{2 \over 4} \times 5\]
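The same equation can also be written with \frac, which is the more common LaTeX way of writing fractions (\over is an older TeX command). A small worked version, including the result:

```latex
$$ \frac{2}{4} \times 5 = \frac{10}{4} = 2.5 $$
```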
Reference hint 1: there are two ways to cite in R
Markdown using either @author or
[@author].
Reference hint 2: each reference in the
references.bib file has a label which you use within the cite command
like @ggtext.
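If you later need to add your own entries to a .bib file, R can produce BibTeX for R itself (and for installed packages). This is a sketch of one way to do it, not something the workshop requires. The entry generated for R itself may come without a label, in which case you would type one in after the opening brace before you can cite it.

```r
# Build a BibTeX entry for R itself using base R functions
r_entry <- toBibtex(citation())

# Print it so it can be copied into references.bib
print(r_entry)

# For an installed package, pass its name, e.g. citation("knitr")
```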
The template, rather helpfully, has been set up to handle references and citations. When using R Markdown, to use references you just need to add the following two lines in the YAML header.
bibliography: "references.bib"
biblio-style: apsr
As soon as you add in a reference, the references page will appear at the bottom of your document. You can change the referencing style to whatever you need and the R Markdown book or Quarto documentation provide a guide to this.
When you re-knit your document you should see a reference page at the bottom!
However, you notice that the colour of the citations, and the links you added, look different to the pdf document we are trying to replicate. How do we fix this irregularity?
Luckily this is straightforward in R Markdown and can be changed in
the YAML header. Looking in the YAML header in the template you notice
urlcolor and linkcolor which you suspect need
to be changed.
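As a sketch, the relevant lines in the YAML header could look like this. The colour names are placeholders, not the answer: pick the ones that match the pdf you are replicating. citecolor is an extra option not mentioned above; it is assumed here to control the colour of citation links in pdf output.

```yaml
linkcolor: blue    # internal links such as the table of contents (placeholder colour)
urlcolor: blue     # web links (placeholder colour)
citecolor: blue    # citations (assumed extra option, placeholder colour)
```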
Now our links are sorted, we can do the results section which has some new elements for us to think about including a table, footnotes and figure referencing.
- paygap_sector_averages.csv to the table converter to get your table
- paygap.png image file. Be sure to pay attention to the positioning of the figure, you might need to try different positions till it looks right
- \ref{fig:chunk-label}
Figure hint 1: the figure label in R Markdown will be the code chunk label if you use the knitr::include_graphics method.
Figure hint 2: figure position is determined by a parameter in the
chunk options, for example:
{r, chunk-label, out.width="80%", fig.pos = "p"}
This section is strictly extra information which might be helpful in your future document writing endeavours. Feel free to skip over it and move on to task 8.
We have gone for the most basic and simple option here when adding a table, but there are many other options at our disposal because we are using R Markdown. We’ve listed these options below for you to look at in your own time.
- knitr package
I’ve added two examples of what we can do using kableExtra; all code examples are based on the examples from the kableExtra documentation.
This first example brings in the csv file we used for the table, then we display it using kableExtra.
# load in kableExtra
library(kableExtra)
# load in csv
sector_avg <- read.csv("paygap_sector_averages.csv")
# make table with kableExtra
kbl(x = sector_avg, booktabs = TRUE,
    col.names = c("Sector", "% increase in mens wages compared to womens"),
    linesep = " ", align = "c",
    caption = "Table to show the sectors with, on average, the largest percent gap in men’s wages compared to women’s wages.") %>%
  kable_styling(latex_options = "striped") %>%
  # colour the second column by the size of the wage gap
  column_spec(column = 2,
              color = spec_color(sector_avg$percent_increase_in_mens_wages_compared_to_womens,
                                 option = "A",
                                 end = 0.8))
| Sector | % increase in mens wages compared to womens |
|---|---|
| primary | 0.2835 |
| secondary | 0.2494 |
| education | 0.2383 |
| financial | 0.2112 |
| construction | 0.1978 |
| technology | 0.1932 |
| information | 0.1892 |
| head | 0.1640 |
| offices | 0.1640 |
| technical | 0.1615 |
This second example uses the code from the sector_averages.R script, then we display the output using kableExtra. I’ve included the code here but you can use chunk options to stop the code showing, such as using echo=FALSE. I’ve also included message=FALSE and warning=FALSE as chunk options so we don’t get loading messages and such.
# load libraries
library(tidytext)
library(tidyverse)
# read in the data
paygap <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-06-28/paygap.csv')
uk_sic_codes <- read_csv("https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/527619/SIC07_CH_condensed_list_en.csv") %>%
janitor::clean_names()
# clean the sic codes in pay gap, then join with our sic codes data
paygap_joined <- paygap %>%
select(employer_name, diff_median_hourly_percent, sic_codes) %>%
separate_rows(sic_codes, sep = ":") %>%
left_join(uk_sic_codes, by = c("sic_codes" = "sic_code"))
# tokenise, and remove stop words
paygap_token <- paygap_joined %>%
unnest_tokens(word, description) %>%
anti_join(get_stopwords()) %>%
na.omit()
# count tokens and pull out top 50
top_words <- paygap_token %>%
count(word) %>%
filter(!word %in% c("activities", "n.e.c", "general", "non")) %>%
slice_max(n, n = 50) %>%
pull(word)
# calculate wage differences
paygap_final <- paygap_token %>%
filter(word %in% top_words) %>%
transmute(
diff_wage = diff_median_hourly_percent / 100,
word) %>%
group_by(word) %>%
summarise(diff_wage = round(mean(diff_wage), digits = 4)) %>%
arrange(desc(diff_wage)) %>%
rename(sector = word, percent_increase_in_mens_wages_compared_to_womens = diff_wage)
# just the top 10
paygap_top <- paygap_final %>% slice_head(n = 10)
# make kableExtra table
kbl(x = paygap_top, booktabs = TRUE,
    col.names = c("Sector", "% increase in mens wages compared to womens"),
    linesep = " ", align = "c",
    caption = "Table to show the sectors with, on average, the largest percent gap in men’s wages compared to women’s wages.") %>%
  kable_styling(latex_options = "striped") %>%
  # colour the second column using the values from the table we just built
  column_spec(column = 2,
              color = spec_color(paygap_top$percent_increase_in_mens_wages_compared_to_womens,
                                 option = "A",
                                 end = 0.8))
| Sector | % increase in mens wages compared to womens |
|---|---|
| primary | 0.2835 |
| secondary | 0.2494 |
| education | 0.2383 |
| financial | 0.2112 |
| construction | 0.1978 |
| technology | 0.1932 |
| information | 0.1892 |
| head | 0.1640 |
| offices | 0.1640 |
| technical | 0.1615 |
I should also note that the above idea of including your scripts in
your document also applies to the paygap.png image we added
which is generated from the paygap.R script.
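If you want the same chunk options, such as echo=FALSE, message=FALSE and warning=FALSE, applied to every chunk, one option is to set them once near the top of the document. This is a minimal sketch using knitr, which assumes your document is compiled with the knitr engine (the usual case for R code in Quarto).

```r
# Set default chunk options for all chunks that follow.
# Individual chunks can still override these in their own options.
knitr::opts_chunk$set(
  echo = FALSE,     # hide the code
  message = FALSE,  # hide messages such as package loading notes
  warning = FALSE   # hide warnings
)
```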
We are almost there, just the discussion left. Two new features here are a quote and a list.
Wow, we’ve just taken a pdf document and converted it to R Markdown!
Last but not least you might be interested in either adding in or calculating your word count. This is not the most straightforward in R Markdown, but there is a nice workaround.
Copy and paste this code and add it into your document. Make sure it is not in a code cell. This is known as inline code and produces the following: Word count: 4531. You will likely get a different word count!
**Word count**: `r as.integer(sub("(\\d+).+$","\\1",system(sprintf("wc -w %s", knitr::current_input()),intern=TRUE)))-20`
Have a go at the take home challenge, or set your own challenge like converting a report or document you’ve previously written into Quarto/R Markdown or LaTeX. You can try and use Quarto/R Markdown or LaTeX for reports you write this year, which is a great opportunity to learn.
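A note on the word count code above: it calls the wc command line tool, which may not be available on Windows. As a possible alternative, the sketch below counts words using only base R. count_words() is a hypothetical helper, its offset argument mirrors the 20 subtracted in the original code (presumably to discount words that are not part of the report text), and its result may differ slightly from wc.

```r
# Hypothetical helper: count whitespace-separated words in a text file
count_words <- function(path, offset = 20) {
  lines <- readLines(path, warn = FALSE)
  words <- unlist(strsplit(paste(lines, collapse = " "), "\\s+"))
  # Drop empty strings created by leading whitespace, then apply the offset
  sum(nzchar(words)) - offset
}

# Small self-contained check using a temporary file
tmp <- tempfile(fileext = ".qmd")
writeLines(c("one two three", "four five"), tmp)
count_words(tmp, offset = 0)
```

Inside your document you could then try it as inline code, passing knitr::current_input() as the path in the same way the original snippet does.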
You also might wonder if there are other options than LaTeX and Quarto/R Markdown. For example, what if you are not primarily an R user but want to get all the great features of Quarto/R Markdown?
The great news is Quarto is designed to be multi-lingual, so you can do what we did today but use Quarto with Jupyter Notebooks or VS Code instead, using Python code for any analysis you want to add.
These days you can also write Python, as well as other code, in RStudio. For example, to use Python you use the reticulate package. There are also options available to code in C++, JavaScript and more.
We should also mention that Jupyter Notebooks has the ability to write publication ready reports using the Jupyter book extension. If you generally prefer Jupyter Notebooks then it is worth having a look, but from my experience it is less user-friendly and flexible than R Markdown/Quarto. I’d recommend using Quarto with Jupyter Notebooks from within Visual Studio Code (VS Code) to get the best possible experience. Check out the VS Code Jupyter extension and how to use it with Quarto; you can also do this with R and Julia.
Write your CV using R Markdown with the vitae package! This is a very convenient way of writing and keeping your CV up to date as it is data driven. What this basically means is you keep all the information about your work, skills, experiences and so on in an Excel file (or Google sheets), which you load into R Markdown to create a cool CV.
There are lots of good examples on the vitae GitHub page, but two I’d recommend are from Lorena Abad and Shazia Ruybal-Pesántez.
My suggestion on getting started is the following:
- install.packages('vitae')
- Code (green button) -> Download ZIP which will download the files for you
- cv_data.xlsx. You can change this so it has your experience and skills in it
- cv-vitae.Rmd file and change what you need, regularly compiling (knitting) to see what it looks like
- awesome-cv.cls file. You can copy the contents of the file on her GitHub and paste it into your local copy of the awesome-cv.cls file. Knit your report to see the difference.
- \faIcon{briefcase} Professional Experience. Try and add some of these into your CV. Hint: you’ll likely need the fontawesome package for this: https://github.com/rstudio/fontawesome